Converting between data frames and contingency tables

Problem

You want to convert between a data frame of cases, a data frame of counts of each type of case, and a contingency table.

Solution

Suppose we start with this data frame, in which each row represents one case:

cases <- data.frame(Sex=c("M", "M", "F", "F", "F"), 
                    Color=c("brown", "blue", "brown", "brown", "brown"))
# Sex Color
#   M brown
#   M  blue
#   F brown
#   F brown
#   F brown

The same data can also be represented as a contingency table. Here it is converted and stored in ctable:

# Cases to Table
ctable  <- table(cases)
#    Color
# Sex blue brown
#   F    0     3
#   M    1     1

# If you call table using two vectors, it will not add names (Sex and Color) to the dimensions
table(cases$Sex, cases$Color)
#    blue brown
#  F    0     3
#  M    1     1

# The dimension names can be specified manually, or by using a subset of the data frame that
# contains only the desired columns
table(cases$Sex, cases$Color, dnn=c("Sex","Color"))
table(cases[,c("Sex","Color")])
#    Color
# Sex blue brown
#   F    0     3
#   M    1     1
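The dimension names are stored as the names of the table's dimnames list, so a third option is to assign them after the table has been built. A minimal sketch, reusing the cases data frame from above; t_unnamed and t_named are illustrative names, not part of the recipe:

```r
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))

# Built from two bare vectors: the dimensions have empty names
t_unnamed <- table(cases$Sex, cases$Color)
names(dimnames(t_unnamed))
# [1] "" ""

# Assign the dimension names afterwards
names(dimnames(t_unnamed)) <- c("Sex", "Color")

# Built from the data frame: the column names are used automatically
t_named <- table(cases)

identical(dimnames(t_unnamed), dimnames(t_named))
# [1] TRUE
```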

It can also be represented as a data frame of counts for each combination. Here it is converted and stored in countdf:

# Cases to Counts
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)
# Sex Color Freq
#   F  blue    0
#   M  blue    1
#   F brown    3
#   M brown    1
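Note that countdf lists every combination of levels, including ones that never occur (F with blue has a Freq of 0). If only the observed combinations are wanted, one simple approach is to filter on Freq; observed is an illustrative name:

```r
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)

# Keep only combinations that actually occur in the data
observed <- countdf[countdf$Freq > 0, ]
observed
#   Sex Color Freq
# 2   M  blue    1
# 3   F brown    3
# 4   M brown    1

# Dropping zero-count rows loses no cases
sum(observed$Freq) == nrow(cases)
# [1] TRUE
```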

These three data structures represent the same information, but in different ways. Here are other ways of converting between them. Some of these require a function expand.dft(), which is defined below.

Converting from a contingency table to the other two formats:

# Table to Counts
as.data.frame(ctable, stringsAsFactors=TRUE)

# Table to Cases
expand.dft(as.data.frame(ctable, stringsAsFactors=TRUE))

Converting from a data frame of counts to the other two formats:

# Counts to Cases
expand.dft(countdf)
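If defining a helper function is not wanted, the same expansion can be sketched in base R by repeating row indices with rep(). This is a swapped-in alternative, not the expand.dft() approach: it skips the type.convert() step, so Sex and Color stay factors. The names idx and cases2 are illustrative:

```r
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)

# Repeat each row index Freq times (rows with Freq 0 disappear),
# then keep every column except Freq
idx <- rep(seq_len(nrow(countdf)), times = countdf$Freq)
cases2 <- countdf[idx, setdiff(names(countdf), "Freq")]
rownames(cases2) <- NULL
cases2
#   Sex Color
# 1   M  blue
# 2   F brown
# 3   F brown
# 4   F brown
# 5   M brown
```

The rows come out grouped by combination rather than in the original case order, but the counts match the original cases.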

# Counts to Table
xtabs(Freq ~ Sex + Color, data=countdf)
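In the xtabs() formula, the left-hand side (Freq) supplies the cell counts and the right-hand side names the columns that become dimensions. As a sanity check, the rebuilt table can be compared cell by cell with one built directly from the cases; xt is an illustrative name:

```r
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)

# Rebuild the contingency table from the counts
xt <- xtabs(Freq ~ Sex + Color, data=countdf)

# Same cell values as the table built from the cases
all(xt == table(cases))
# [1] TRUE

# The formula terms become the dimension names
names(dimnames(xt))
# [1] "Sex"   "Color"
```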

expand.dft() function

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
    # Take each row in the source data frame table and replicate it
    # using the Freq value
    DF <- sapply(seq_len(nrow(x)),
                 function(i) x[rep(i, each = x$Freq[i]), ],
                 simplify = FALSE)

    # Take the above list and rbind it to create a single DF
    # Also subset the result to eliminate the Freq column
    DF <- subset(do.call("rbind", DF), select = -Freq)

    # Now apply type.convert to the character coerced factor columns
    # to facilitate data type selection for each column
    for (i in seq_along(DF)) {
        DF[[i]] <- type.convert(as.character(DF[[i]]),
                                na.strings = na.strings,
                                as.is = as.is, dec = dec)
    }

    DF
}

This function was written by Marc Schwartz.
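The type.convert() step matters when factor levels look like numbers, which is what as.data.frame() on a table produces for numeric categories. A minimal sketch of that behavior; the counts data frame with an Age column is hypothetical example data, and the function is copied from above so the example runs on its own:

```r
# Copy of expand.dft() from above so this example is self-contained
expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
    DF <- sapply(seq_len(nrow(x)),
                 function(i) x[rep(i, each = x$Freq[i]), ],
                 simplify = FALSE)
    DF <- subset(do.call("rbind", DF), select = -Freq)
    for (i in seq_along(DF)) {
        DF[[i]] <- type.convert(as.character(DF[[i]]),
                                na.strings = na.strings,
                                as.is = as.is, dec = dec)
    }
    DF
}

# Hypothetical counts data: Age is a factor whose levels look like numbers
counts <- data.frame(Age = factor(c("10", "20")),
                     Sex = factor(c("F", "M")),
                     Freq = c(2, 1))

expanded <- expand.dft(counts)
class(expanded$Age)   # numeric-looking levels are converted to integers
# [1] "integer"
class(expanded$Sex)   # other text stays a factor because as.is = FALSE
# [1] "factor"

# With as.is = TRUE, non-numeric columns come back as character instead
class(expand.dft(counts, as.is = TRUE)$Sex)
# [1] "character"
```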